Management of Electrophysiological Data & Metadata - Making complex experiments accessible to yourself and others
Book/Dissertation / PhD Thesis | FZJ-2018-02625
2018
Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag
Jülich
ISBN: 978-3-95806-311-2
Please use a persistent id in citations: http://hdl.handle.net/2128/18468 urn:nbn:de:0001-2018050924
Abstract: As neuroscientists, we are obligated to guarantee the reliability of our research workflows and results. For this reason, most peer-reviewed neuroscientific journals require, in addition to a full description of the new scientific findings and the methods used, at least a brief summary of the data used. For a completely open scientific inquiry, and to further promote scientific progress as required by most national funding agencies, it would be best to share the raw data along with an analysis paper or independently as a standalone data publication. Unfortunately, the neuroscientific community in particular is hesitant to share its research data with third parties, because guidelines, tools and support that would help publishing authors provide data and adequate accompanying information on the experiment are hard to formalize and often missing. As a consequence, published scientific results remain irreproducible for other researchers. Using the example of a complex electrophysiological experiment conducted by external collaboration partners, I demonstrate how to share and publish data, and also identify the reasons why researchers, in particular experimentalists in electrophysiology, are hesitant to do the same. To this end, I first provide a data descriptor of the experiment following the guidelines of Scientific Data, a journal of the Nature publishing group dedicated to data publications. In accordance with the journal guidelines, I describe all information necessary to understand the setup and workflow of the experiment, as well as the minimum information necessary to work with the corresponding datasets. The latter requires the provision of a robust data loading routine.
To guarantee access to the data of the experiment, I implemented a generally usable loading routine for the data formats of the data acquisition (DAQ) system used, the Cerebus DAQ from Blackrock Microsystems, and published it as part of an open source data framework called Neo. Neo has the advantage of representing data in standardized structures that allow researchers to use common analysis routines on different data formats. In addition, the Neo data structures can be further annotated with experiment-specific information on the data. To integrate such information, termed metadata, automatically, it is best to have it organized in a machine-readable format. Although several software solutions for such metadata formats exist, they are usually not tested for complex use cases such as the example experiment. In most cases, they only provide the framework itself as a standardized metadata representation or specification, and no solution for how to actually compile a useful metadata collection. For the example experiment, I chose a metadata framework called open metadata Markup Language (odML), an open source project developed by the German Node (G-Node) of the International Neuroinformatics Coordination Facility (INCF). In the second part of the thesis, I demonstrate how to organize the metadata of the experiment in order to compile and use a corresponding odML metadata collection. To facilitate the compilation process for my collaborators, I developed a Python package called odMLtables, which eases access to the odML framework through an algorithmic transformation of odML into a spreadsheet format (csv or xls) and back. In addition, I provided a complete workflow for collecting the metadata of the experiment and storing it in a comprehensive odML file collection. Furthermore, I provided a dedicated data loading routine that automatically annotates the data structures with the corresponding metadata of the collection.
This routine improves the workflow of neuroscientific analyses of the data from the example experiment, as demonstrated in the last part of my thesis. In summary, I show that the preparations needed to properly share research data within a scientific collaboration are cumbersome and time-consuming, but essential for successfully publishing data and analysis results for a broader audience. To promote data sharing within the neuroscientific community and to provide a better foundation for reproducible research, my thesis offers a coherent strategy for managing electrophysiological data and metadata using a well-selected set of available technologies.
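
The abstract describes an algorithmic transformation of hierarchical metadata into a spreadsheet format and back. The following is a minimal sketch of that idea only, using the Python standard library. It does not use the odML or odMLtables APIs: plain nested dictionaries stand in for odML sections, and the function names, column names and metadata values are all hypothetical.

```python
import csv
import io


def flatten(section, path=""):
    """Yield (section_path, property_name, value) rows from a nested dict."""
    for key, value in section.items():
        if isinstance(value, dict):
            # A nested dict stands in for a subsection of the metadata tree.
            yield from flatten(value, f"{path}/{key}")
        else:
            # Any other value is treated as a property of the current section.
            yield (path or "/", key, str(value))


def to_csv(tree):
    """Serialize the nested metadata tree as CSV text, one property per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["section", "property", "value"])
    writer.writerows(flatten(tree))
    return buffer.getvalue()


def from_csv(text):
    """Rebuild the nested metadata tree from CSV text written by to_csv."""
    tree = {}
    for row in csv.DictReader(io.StringIO(text)):
        node = tree
        # Walk (and create) the chain of sections named in the path column.
        for part in row["section"].strip("/").split("/"):
            if part:
                node = node.setdefault(part, {})
        node[row["property"]] = row["value"]
    return tree


# Hypothetical metadata; all values are strings so the round trip is exact.
metadata = {
    "Subject": {"Name": "example-subject"},
    "Setup": {"DAQ": {"SamplingRate": "30000"}},
}

restored = from_csv(to_csv(metadata))
```

In this sketch each spreadsheet row carries the full section path, so collaborators could edit values in a table without touching the hierarchy. Empty sections and value types are not preserved, which a real implementation would need to handle.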